AITopics | swin transformer

Collaborating Authors

swin transformer

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

480150047ecb2187a3a8b8dccfd8f2de-Paper-Conference.pdf

Neural Information Processing SystemsFeb-12-2026, 08:57:44 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry: Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

Add feedback

a37fea8e67f907311826bc1ba2654d97-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-11-2026, 03:01:05 GMT

image super-resolution, swinir, urban100, (12 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (0.97)

Add feedback

726204cea3ec27790a644e5b379175e3-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-9-2026, 18:52:18 GMT

computational pathology, dataset, wsis, (13 more...)

Neural Information Processing Systems

Country:

Asia > China > Sichuan Province > Chengdu (0.05)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Automatic Pith Detection in Tree Cross-Section Images Using Deep Learning

Liao, Tzu-I, Fakhry, Mahmoud, Varghese, Jibin Yesudas

arXiv.org Artificial IntelligenceDec-2-2025

Pith detection in tree cross-sections is essential for forestry and wood quality analysis but remains a manual, error-prone task. This study evaluates deep learning models -- YOLOv9, U-Net, Swin Transformer, DeepLabV3, and Mask R-CNN -- to automate the process efficiently. A dataset of 582 labeled images was dynamically augmented to improve generalization. Swin Transformer achieved the highest accuracy (0.94), excelling in fine segmentation. YOLOv9 performed well for bounding box detection but struggled with boundary precision. U-Net was effective for structured patterns, while DeepLabV3 captured multi-scale features with slight boundary imprecision. Mask R-CNN initially underperformed due to overlapping detections, but applying Non-Maximum Suppression (NMS) improved its IoU from 0.45 to 0.80. Generalizability was next tested using an oak dataset of 11 images from Oregon State University's Tree Ring Lab. Additionally, for exploratory analysis purposes, an additional dataset of 64 labeled tree cross-sections was used to train the worst-performing model to see if this would improve its performance generalizing to the unseen oak dataset. Key challenges included tensor mismatches and boundary inconsistencies, addressed through hyperparameter tuning and augmentation. Our results highlight deep learning's potential for tree cross-section pith detection, with model choice depending on dataset characteristics and application needs.

artificial intelligence, detection, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2512.00625

Country: North America > United States > Oregon (0.36)

Genre: Research Report > New Finding (0.88)

Industry: Materials > Paper & Forest Products (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Application of Graph Based Vision Transformers Architectures for Accurate Temperature Prediction in Fiber Specklegram Sensors

Sebastian, Abhishek

arXiv.org Artificial IntelligenceNov-20-2025

Fiber Specklegram Sensors (FSS) are highly effective for environmental monitoring, particularly for detecting temperature variations. However, the nonlinear nature of specklegram data presents significant challenges for accurate temperature prediction. This study investigates the use of transformer-based architectures, including Vision Transformers (ViTs), Swin Transformers, and emerging models such as Learnable Importance Non-Symmetric Attention Vision Transformers (LINA-ViT) and Multi-Adaptive Proximity Vision Graph Attention Transformers (MAP-ViGAT), to predict temperature from specklegram data over a range of 0 to 120 Celsius. The results show that ViTs achieved a Mean Absolute Error (MAE) of 1.15, outperforming traditional models such as CNNs. GAT-ViT and MAP-ViGAT variants also demonstrated competitive accuracy, highlighting the importance of adaptive attention mechanisms and graph-based structures in capturing complex modal interactions and phase shifts in specklegram data. Additionally, this study incorporates Explainable AI (XAI) techniques, including attention maps and saliency maps, to provide insights into the decision-making processes of the transformer models, improving interpretability and transparency. These findings establish transformer architectures as strong benchmarks for optical fiber-based temperature sensing and offer promising directions for industrial monitoring and structural health assessment applications.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.14792

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Consumer Health (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Imaging-Based Mortality Prediction in Patients with Systemic Sclerosis

Peltekian, Alec K., Senkow, Karolina, Durak, Gorkem, Grudzinski, Kevin M., Bemiss, Bradford C., Dematte, Jane E., Richardson, Carrie, Markov, Nikolay S., Carns, Mary, Aren, Kathleen, Soriano, Alexandra, Dapas, Matthew, Perlman, Harris, Gundersheimer, Aaron, Selvan, Kavitha C., Varga, John, Hinchcliff, Monique, Warrior, Krishnan, Gao, Catherine A., Wunderink, Richard G., Budinger, GR Scott, Choudhary, Alok N., Esposito, Anthony J., Misharin, Alexander V., Agrawal, Ankit, Bagci, Ulas

arXiv.org Artificial IntelligenceNov-19-2025

Interstitial lung disease (ILD) is a leading cause of morbidity and mortality in systemic sclerosis (SSc). Chest computed tomography (CT) is the primary imaging modality for diagnosing and monitoring lung complications in SSc patients. However, its role in disease progression and mortality prediction has not yet been fully clarified. This study introduces a novel, large-scale longitudinal chest CT analysis framework that utilizes radiomics and deep learning to predict mortality associated with lung complications of SSc. We collected and analyzed 2,125 CT scans from SSc patients enrolled in the Northwestern Scleroderma Registry, conducting mortality analyses at one, three, and five years using advanced imaging analysis techniques. Death labels were assigned based on recorded deaths over the one-, three-, and five-year intervals, confirmed by expert physicians. In our dataset, 181, 326, and 428 of the 2,125 CT scans were from patients who died within one, three, and five years, respectively. Using ResNet-18, DenseNet-121, and Swin Transformer we use pre-trained models, and fine-tuned on 2,125 images of SSc patients. Models achieved an AUC of 0.769, 0.801, 0.709 for predicting mortality within one-, three-, and five-years, respectively. Our findings highlight the potential of both radiomics and deep learning computational methods to improve early detection and risk assessment of SSc-related interstitial lung disease, marking a significant advancement in the literature.

artificial intelligence, ct scan, machine learning, (11 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-032-07904-6_2

2509.2353

Country:

North America > United States > Michigan (0.28)
North America > United States > Illinois > Cook County (0.15)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.89)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

RoCoISLR: A Romanian Corpus for Isolated Sign Language Recognition

Rîpanu, Cătălin-Alexandru, Hotnog, Andrei-Theodor, Imbrea, Giulia-Stefania, Cercel, Dumitru-Clementin

arXiv.org Artificial IntelligenceNov-18-2025

Automatic sign language recognition plays a crucial role in bridging the communication gap between deaf communities and hearing individuals; however, most available datasets focus on American Sign Language. For Romanian Isolated Sign Language Recognition (RoISLR), no large-scale, standardized dataset exists, which limits research progress. In this work, we introduce a new corpus for RoISLR, named RoCoISLR, comprising over 9,000 video samples that span nearly 6,000 standardized glosses from multiple sources. We establish benchmark results by evaluating seven state-of-the-art video recognition models-I3D, SlowFast, Swin Transformer, TimeSformer, Uniformer, VideoMAE, and PoseConv3D-under consistent experimental setups, and compare their performance with that of the widely used WLASL2000 corpus. According to the results, transformer-based architectures outperform convolutional baselines; Swin Transformer achieved a Top-1 accuracy of 34.1%. Our benchmarks highlight the challenges associated with long-tail class distributions in low-resource sign languages, and RoCoISLR provides the initial foundation for systematic RoISLR research.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2511.12767

Country: Europe > Romania (0.15)

Genre: Research Report (0.50)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.89)

Add feedback

2D Representation for Unguided Single-View 3D Super-Resolution in Real-Time

Mas, Ignasi, Huerta, Ivan, Morros, Ramon, Ruiz-Hidalgo, Javier

arXiv.org Artificial IntelligenceNov-12-2025

We introduce 2Dto3D-SR, a versatile framework for real-time single-view 3D super-resolution that eliminates the need for high-resolution RGB guidance. Our framework encodes 3D data from a single viewpoint into a structured 2D representation, enabling the direct application of existing 2D image super-resolution architectures. We utilize the Projected Normalized Coordinate Code (PNCC) to represent 3D geometry from a visible surface as a regular image, thereby circumventing the complexities of 3D point-based or RGB-guided methods. This design supports lightweight and fast models adaptable to various deployment environments. We evaluate 2Dto3D-SR with two implementations: one using Swin Transformers for high accuracy, and another using Vision Mamba for high efficiency. Experiments show the Swin Transformer model achieves state-of-the-art accuracy on standard benchmarks, while the Vision Mamba model delivers competitive results at real-time speeds. This establishes our geometry-guided pipeline as a surprisingly simple yet viable and practical solution for real-world scenarios, especially where high-resolution RGB data is inaccessible.

artificial intelligence, machine learning, real time system, (18 more...)

arXiv.org Artificial Intelligence

2511.08224

Genre: Research Report (0.40)

Technology:

Information Technology > Architecture > Real Time Systems (0.92)
Information Technology > Artificial Intelligence > Vision (0.78)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Anatomy-Aware Lymphoma Lesion Detection in Whole-Body PET/CT

Bendazzoli, Simone, Tzortzakakis, Antonios, Abrahamsson, Andreas, Wahlin, Björn Engelbrekt, Smedby, Örjan, Holstensson, Maria, Moreno, Rodrigo

arXiv.org Artificial IntelligenceNov-11-2025

Early cancer detection is crucial for improving patient outcomes, and 18F FDG PET/CT imaging plays a vital role by combining metabolic and anatomical information. Accurate lesion detection remains challenging due to the need to identify multiple lesions of varying sizes. In this study, we investigate the effect of adding anatomy prior information to deep learning-based lesion detection models. In particular, we add organ segmentation masks from the TotalSegmentator tool as auxiliary inputs to provide anatomical context to nnDetection, which is the state-of-the-art for lesion detection, and Swin Transformer. The latter is trained in two stages that combine self-supervised pre-training and supervised fine-tuning. The method is tested in the AutoPET and Karolinska lymphoma datasets. The results indicate that the inclusion of anatomical priors substantially improves the detection performance within the nnDetection framework, while it has almost no impact on the performance of the vision transformer. Moreover, we observe that Swin Transformer does not offer clear advantages over conventional convolutional neural network (CNN) encoders used in nnDetection. These findings highlight the critical role of the anatomical context in cancer lesion detection, especially in CNN-based models.

artificial intelligence, detection, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2511.07047

Country: Europe > Sweden (0.16)

Genre: Research Report > New Finding (0.89)

Industry: